The RICIS Concept


The University of Houston-Clear Lake established the Research Institute for
Computing and Information Systems in 1986 to encourage NASA Johnson Space
Center and local industry to actively support research in the computing and
information sciences. As part of this endeavor, UH-Clear Lake proposed a
partnership with JSC to jointly define and manage an integrated program of research
in advanced data processing technology needed for JSC's main missions, including
administrative, engineering and science responsibilities. JSC agreed and entered into
a three-year cooperative agreement with UH-Clear Lake beginning in May 1986, to
jointly plan and execute such research through RICIS. Additionally, under
Cooperative Agreement NCC 9-16, computing and educational facilities are shared
by the two institutions to conduct the research.


The mission of RICIS is to conduct, coordinate and disseminate research on
computing and information systems among researchers, sponsors and users from
UH-Clear Lake, NASA/JSC, and other research organizations. Within UH-Clear
Lake, the mission is being implemented through interdisciplinary involvement of
faculty and students from each of the four schools: Business, Education, Human
Sciences and Humanities, and Natural and Applied Sciences.

Other research organizations are involved via the “gateway” concept. UH-Clear
Lake establishes relationships with other universities and research organizations,
having common research interests, to provide additional sources of expertise to
conduct needed research.

A major role of RICIS is to find the best match of sponsors, researchers and
research objectives to advance knowledge in the computing and information
sciences. Working jointly with NASA/JSC, RICIS advises on research needs,
recommends principals for conducting the research, provides technical and
administrative support to coordinate the research, and integrates technical results
into the cooperative goals of UH-Clear Lake and NASA/JSC.

Preface


This document constitutes the fifth delivery, “Revised Final Report,” of the five deliveries scheduled for the 
first phase of RICIS contract 069, “Verification and Validation of Expert Systems Study.” 

This delivery updates the final report which was delivered on September 14, 1990. The
revisions reflect newly received survey responses, interviews, and review comments.

Background

The purpose of this task is to determine the state-of-the-practice in Verification and Validation (V&V) of
Expert Systems (ESs) on current NASA and Industry applications. This is the first task of a series which 
has the ultimate purpose of ensuring that adequate ES V&V tools and techniques are available for Space 
Station Knowledge Based Systems development. 

The strategy for determining the state-of-the-practice is to check how well each of the known ES V&V issues
is being addressed and to what extent these issues have impacted the development of Expert Systems.

Note: This task does not attempt to prove or disprove whether Verification and Validation can or should be
performed on Expert Systems. It is accepted that Verification and Validation should be applied to all
software systems, including Expert Systems.

Executive Summary


Data from over sixty Expert System (ES) projects was collected through a written survey and/or interviews. 
Forty basic questions were asked, ranging over a variety of general topics such as the size of the ES and the 
difficulty in specifying requirements. However, all the questions were designed to gather information about 
different aspects of V&V. Significant results include the following points (see “Summary of Results” on
page 8 for the actual percentages):

1. In most cases, the ES was expected to be at least as accurate as the expert, but often the ES was less
accurate.

2. All users estimated the ES to be less accurate than expected while half the developers estimated the ES 
to be less accurate than expected. 

3. Less than half the systems had a requirements document. 

4. On average, a quarter of the developers' time was spent on V&V.

5. While developers thought evaluating an expert system was of average difficulty, users unanimously 
thought it was hard. 

6. All V&V techniques were used, with each technique being relied upon, by at least one project, as the 
sole V&V technique used. 

7. The most often cited V&V problems were test coverage determination, knowledge validation, and 
problem complexity. 

Based on an analysis of the survey results, several recommendations were formulated. These
recommendations are:

1. Develop suggested V&V requirements for ESs, that is, standards and guidelines for V&V of ESs at each stage
of development.

2. Address the test coverage determination, knowledge validation, and problem complexity issues. 

3. Develop ways to make knowledge bases more easily modularized and easier to understand. 

4. Address the configuration management of expert systems. 

5. Develop criteria to classify an ES by intended use so that V&V requirements can be tailored to different 
types of ESs. 

6. Investigate ways to assist an expert in analyzing a knowledge base, possibly either through the use of 
analysis tools or higher level representations. 

Survey Rationale

It is widely claimed that Expert Systems have not been subject to the same level of Verification and
Validation as traditionally developed software. Some people feel that this lack of V&V continues because of 
a 'vicious circle,' where nobody requires expert system V&V, so nobody does it. Consequently, since 
nobody knows how to do it, nobody requires it. There are two major reasons why the V&V process has not 
been documented: lack of a single life-cycle model, and technical differences between traditional software and 
expert systems. 

Most expert system development life-cycles rely on iterative prototypes to develop the system behavior. This 
approach does not lead to methodical capture and documentation of the expected system behavior. Docu- 
mented expectations, traditionally captured in a requirements document, are essential in the V&V process: 
you can't do testing if you don't know what to test for! One goal of this survey is to understand how the 
expected behavior of current expert systems is communicated and evaluated, even if a formal requirements 
document was not developed. 

Expert Systems are typically composed of three parts: the knowledge base (KB), the inference engine, and 
the interface code between the inference engine and the peripheral devices (terminals, sensors, effectors, users, 
etc.). The inference engine and interface code are simply traditional software and should currently be 
V&Ved by accepted practices. This survey will help determine if these parts are V&Ved or whether, since 
they are part of an expert system, V&V is overlooked. 

The knowledge base is the only part of the Expert System that raises new and unique issues. A set of the
possible issues is:

Issues primarily due to use of nonprocedural languages 

• Understandability and readability to support inspections 

• Testing coverage 

• Standard validation tests for inference engines 

• Real-time performance analysis 

Issues due to heuristic knowledge (difficulty in organizing) 

• Knowledge validation 

• Modularity/Design

Issues primarily due to solving new complex problems 

• Requirements 

• Certification 


Other issues 

• Uncertainty Analysis 

• Inheritance Process Test and Analysis 

• Configuration Management 

One of the purposes of this survey is to find out if these identified possible issues actually cause problems in 
practice, and if so, how the issues are being handled. 
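
The "testing coverage" issue listed above can be made concrete with a small sketch. The example below is only an illustration under stated assumptions: the rule names, thresholds, and test cases are hypothetical and do not come from any surveyed project, and the rule format is not that of any particular Expert System shell. It shows one simple way to measure which rules a set of test cases actually exercises.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Facts = Dict[str, float]


@dataclass(frozen=True)
class Rule:
    """One hypothetical IF-THEN rule of an illustrative knowledge base."""
    name: str
    condition: Callable[[Facts], bool]
    conclusion: str


# Hypothetical diagnostic rules; names and thresholds are invented for illustration.
RULES: List[Rule] = [
    Rule("R1_overheat", lambda f: f["temp_c"] > 90, "coolant fault"),
    Rule("R2_low_pressure", lambda f: f["pressure_kpa"] < 50, "pump fault"),
    Rule("R3_leak", lambda f: f["temp_c"] > 90 and f["pressure_kpa"] < 50, "loop leak"),
]


def fired_rules(rules: List[Rule], facts: Facts) -> Set[str]:
    """Return the names of all rules whose condition holds for the given facts."""
    return {rule.name for rule in rules if rule.condition(facts)}


def rule_coverage(rules: List[Rule], test_cases: List[Facts]) -> Tuple[float, Set[str]]:
    """Return the fraction of rules fired by at least one test case, and the rules never fired."""
    exercised: Set[str] = set()
    for facts in test_cases:
        exercised |= fired_rules(rules, facts)
    never_fired = {rule.name for rule in rules} - exercised
    return len(exercised) / len(rules), never_fired


# Hypothetical test cases: the first fires only R1, the second fires no rule.
TEST_CASES: List[Facts] = [
    {"temp_c": 95.0, "pressure_kpa": 80.0},
    {"temp_c": 60.0, "pressure_kpa": 80.0},
]

coverage, never_fired = rule_coverage(RULES, TEST_CASES)
print(f"Rule coverage: {coverage:.0%}; never fired: {sorted(never_fired)}")
```

A measure of this kind only shows that each rule fired at least once. It says nothing about interactions between rules, which is part of why coverage determination is listed as an open issue.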

Purpose of the Questionnaires


Some of the information for this survey can be captured fairly easily through use of a
questionnaire. The information captured this way includes:

• Application information - What kind of problem does the system address? What are the performance
goals?

• Expertise information - What was the relationship between the developers and expert(s)? What is the
performance level of the expert?

• Development information - How was the system developed? How big is the system?

• Evaluation information - How was the system evaluated?

• Performance information - How important is good performance? How well is the ES performing?


Purpose of the Interviews 

The questionnaire answers lead to an additional set of questions involving the V&V issues described earlier.
The additional questions are greatly affected by the answers provided in the questionnaire, so it would be
more efficient to derive the information through direct interviews than to generate a large number of
secondary questionnaires. The interviews attempt to uncover:

• the real issues involved in ES V&V (in comparison with the known possible issues outlined above). 

• what is being done currently to address V&V (inspections, path testing, testing by the expert). 

• what makes users trust the ESs, if the ESs are indeed trusted. 

• what problems, unique to ESs, were encountered and possibly addressed during development and test.

The interviews are also required because we expect that some people will not fill out the questionnaires.

Survey Administration

This survey was designed so that the majority of the information would be gained from direct interviews 
with people involved in ES projects. Several people from each project, including developers, users, and man- 
agers, were interviewed to get a realistic view of the projects. 

Several other activities were undertaken, both before and after the interview activity, to ensure that the 
results of the survey reflected the actual "state-of-the-practice". These activities included: 

Identifying candidate ES projects 

A list of projects to be contacted was created. The list included projects at NASA and IBM as 
well as projects from fields outside of the space industry. 

Developing survey questionnaire(s)

To improve the chances of getting meaningful data from the questionnaire activity, separate
questionnaires were developed for developers and users. Each questionnaire includes a question to
indicate if the answers are from a manager or non-manager. Questionnaires are listed in 
Appendix B, “Expert Systems Evaluation Questionnaire (Developer)” on page 36 and 
Appendix C, “Expert Systems Evaluation Questionnaire (User)” on page 44. 

Evaluating returned questionnaires 

Each questionnaire was evaluated to determine if project interviews would uncover more infor- 
mation. If a project was to be interviewed, the questionnaire results provided guidance on which 
topics would be the most useful to explore. 

Summarizing interview/questionnaire results 

The summarized results of the questionnaire/interview activities are presented in section
“Summary of Results” on page 8.

Recommendations 

Recommendations for further action, based on the information in “Summary of Results” on 
page 8 are provided in section “Recommendations” on page 22. 

Survey Questionnaires

Different versions of the questionnaire were developed for developers and users of the expert system. In 
addition, responses were expected to be different between managers and non-managers, so an indication is 
included on each questionnaire. 


Information Gathered 

Several types of information are captured by the questionnaire. Each question in the questionnaire addresses
at least one of the previous types of information. For each type of information, the subtopics and questions
which provide information are listed. The question numbers are noted as (development question, user
question). Questions not available on a questionnaire are indicated by a "-".

General Information 

Describes the general properties of the expert system, including the name (1, 1), a short
description (4, 4), field of the problem (5, 5), and the type of problem to be solved (6, 6). Also
captured is whether the survey taker was a manager (2, 2).

Performance Criteria 

A major expertise issue is performance (probability that the results given are correct); specifically
performance of the experts (10, 9), expected performance of the system (11, 10), and actual
performance of the system (12, 11). Related to the performance issue is the amount of the problem
space that the ES is expected to cover (8, 7), and that it actually covers (9, 8).

Requirements Definition 

Requirements definition information includes how the requirements are documented (13, -), the 
difficulty in determining the requirements (14, -), and the availability of the expert(s) to resolve 
requirements issues during development (17, -). Influencing the performance issue is the number 
of experts (15, -), and whether the experts agree on the results obtained from the system (16, 21). 
It may also be useful to know if the expert (-, 12) and/or the developer(s) (18, 13) are part of the
user organization. 

Development Information 

Development information that we are concerned with includes the development life-cycle used 
(19, -), and what languages and tools were used to develop the system (20, -). The size of the 
system (22, -), the total effort required for development (29, -), and the effort required to develop
the different parts of the ES (21, -) indicate the difficulty of the development effort. The sensi- 
tivity of the system (24, -) will influence the difficulty of future maintenance activities. 

V&V Activities Performed 

The major information to be captured during this task is the current state-of-the-practice for 
V&V of ESs, including the kinds of V&V being attempted, both during (28, -) and after (33, 20) 
development, and how much of the development effort was spent on V&V (30, -). Detailed 
information is also gathered for V&V activities for Knowledge Structures (25, -), the Inference 
Engine (26, -), and the Interface Code (27, -). 

Information about the difficulty of the V&V effort (35, 22), whether a separate group performed 
V&V, (31, -) and how much effort was expended on the independent V&V (32, 19), is also gath- 
ered. 

Whether the system is operational or prototype (3, 3), and the criticality of the system (37, 15)
have an effect on the amount of V&V activities performed.

V&V Issues Encountered 

If the state-of-the-practice is to be improved, the major issues that need to be addressed must be 
identified. One question (36, 23) directly asks whether each of the known issues was actually
encountered. Additional questions find out more information about specific issues, including the
existence of certainty factors (7, -), whether configuration management was performed (34, -),
and the difficulty of implementing the expertise through the Knowledge Structures (23, -). User
acceptance is the ultimate test of the V&V activities. The comparison between expected system 
use (39, 17) and actual system use (40, 18), the perceived reliability of the system (38, 16), and 
why the user is convinced that the system produces correct results (-, 14) are all indicators of user 
acceptance. 


Human Factors 

The questionnaires were designed to capture as much accurate information as possible. In an effort to 
accomplish this, the following human factors issues were taken into account: 

Questions should be understandable 

Questions should have as few 'technical' terms as possible to avoid confusion due to local usage.
For questions that must have technical content, be sure to provide sufficient explanation. 

Choices worded positively 

Negatively worded choices may not get selected because the responder may feel there is
something wrong with them.

Meaningful questions 

The responder should feel that there is some purpose to the question. 

Make use of fill-in-the-blank questions 

The responder should not have to fill in long responses. Some questions cannot have all
possible responses enumerated, so the responder should be able to specify his own choice.

Summary of Results


The survey results are summarized in the following sections. The results are organized according to the type
of information, as organized in “Information Gathered” on page 6. The percentages in parentheses
correspond to the results from the developer and user questionnaires, respectively. If the question is not in one of
the questionnaires, the position is filled with a "-".

General Information 

Most of the respondents were involved with Expert Systems which perform Diagnosis
(45%, 80%), primarily in the Aerospace field (46%, 100%). The survey respondents were
predominantly involved with development (93%).

Performance Criteria 

(37%, 40%) estimated an actual accuracy of less than 90% and (48%, 60%) estimated an
accuracy of less than 95%. Most (60%, 40%) estimated the problem space coverage between 60%
and 95%. In comparing the accuracy of the expert and the expert system, most expected the
expert system to be at least as accurate as the expert (78%, 80%), while the expert system often was
estimated to be less accurate than expected (49%, 100%) and less accurate than the expert
(44%, 80%). Note that users, more often than developers, cited the system as being less accurate
than the expert and less accurate than expected.

Requirements Definition 

(75%,-) indicated that expert consultation was a basis for determining the behavior of the system. 
More revealing is that (52%,-) said there were not any documented requirements and (43%,-) 
indicated that prototypes or similar tools were used for requirements. 

(40%,-) had medium difficulty in generating requirements while (35%,-) said they were hard and 
(25%,-) said they were easy. (58%,-) of developers had a high level of contact with experts 
during development. 

Development Information 

The most frequent (40%,-) Life-Cycle model used is the Cyclic Model (repetition of
Requirements, Design, Rule Generation, and Prototyping until done); however, (22%,-) of the
respondents stated that no model was followed. Most development was done with an Expert System
shell (CLIPS and others), and the predominant Interface Code languages were C and LISP. Applications
were reasonably large, requiring an average of 33 person/months to develop. Developed systems
were not reported to be particularly sensitive to change; (77%,-) said changes only occasionally
caused an unexpected behavior.

V&V Activities Performed 

Most V&V activities relied on comparison with expected results and expert checking. Typically,
(24%,-) of the development effort was spent on V&V. While developers seemed to feel V&V was
of medium difficulty, users unanimously agreed that it was hard; (34%, 0%) said it was medium
while (27%, 100%) said it was hard and (33%, 0%) said it was easy; (5%, 0%) said it was
impossible. Of significant interest is the fact that each V&V technique was used as the sole V&V
technique in at least one project. Also, in general, a wide range of V&V techniques was used.
(39%, 20%) of the respondents indicated that the ES was a prototype system.

V&V Issues Encountered 

The known issues most often cited as problems were: test coverage determination (50%, 75%),
knowledge validation (44%, 75%), problem complexity (39%, 40%), and real-time performance
analysis (40%, 25%). (Note that as a whole, the developers' ranking of the issues agreed with the
users' ranking of the issues.) The least cited problem was analysis of certainty factors (only seven
respondents indicated that certainty factors were used). Every known issue was cited by at least
one respondent.

Configuration management practices are reported to be an issue for many participants, regardless
of whether the system was operational or a prototype.

The expected system use varied widely (3-2000), while actual system use was relatively good (less
than half of the respondents provided information, suggesting that actual use was much lower
than reported).

The following sections list the results from each individual question. The total number of responses is given
for each question along with the number of times each choice was selected (given to the left of the choice).


General information 

The questions for the name of the ES, and the short description are not reported. 

Field of the Problem 

Question Numbers: 5, 5 
Total Responses: 70 

What field does the problem belong to? 

35 Aerospace
 4 Financial
 2 Information Systems
 8 Hardware
 6 Manufacturing
 2 Marketing
 0 Medical
 1 Personnel
 2 Research
 1 Service
 4 Software
 5 Other

Type of Problem Solved 

Question Numbers: 6, 6 
Total Responses: 70 

Which of the following items best describes the kind of problem the Expert System addresses? Please indi- 
cate primary purpose with a and check all other applicable purposes (if any). 

Note: The number of times the choice was selected as primary purpose is given in parentheses after the 
number of times the choice was selected. 

13 (11) Design - Configuring objects under constraints
11 ( 0) Repair - Executing plans to administer prescribed remedies
11 ( 5) Control - Governing overall system behavior
16 ( 5) Planning - Designing actions
34 (23) Diagnosis - Inferring system malfunctions from observables
11 ( 1) Debugging - Prescribing remedies for malfunctions
   ( 3) Prediction - Inferring likely consequences of given situations
23 ( 8) Monitoring - Comparing observations to expected outcomes
12 ( 1) Instruction - Diagnosing, debugging, and repairing behavior
15 ( 5) Interpretation - Inferring situation descriptions from sensor data
 5 ( 2) Classification - Categorizing objects by properties
 3 (  ) Others

Role on Project

Question Numbers: 2, 2
Total Responses: 70

Were you a developer of the Expert System, the manager of the development organization, a user of the
Expert System, or the manager of a department which uses the Expert System?

42 Developer of Expert System
 6 Manager of Expert System development organization
17 Other Development
 4 User of the Expert System
 0 Manager of a department using the Expert System
 1 Other User
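
As a check on the figures, the 93% share of development respondents quoted in the summary can be recomputed from these counts. The sketch below assumes that the unrecorded count for "Manager of a department using the Expert System" is zero (consistent with the stated total of 70 responses) and that the first three roles are the ones counted as development; both are assumptions of this example, not statements from the survey.

```python
# Counts as tallied for questions (2, 2); the zero is an assumed value for the
# choice whose count is not recorded in the tally.
role_counts = {
    "Developer of Expert System": 42,
    "Manager of Expert System development organization": 6,
    "Other Development": 17,
    "User of the Expert System": 4,
    "Manager of a department using the Expert System": 0,
    "Other User": 1,
}

# Assumption: these three roles make up the "development" group.
DEVELOPMENT_ROLES = (
    "Developer of Expert System",
    "Manager of Expert System development organization",
    "Other Development",
)

total = sum(role_counts.values())
development = sum(role_counts[role] for role in DEVELOPMENT_ROLES)
development_share = 100 * development / total

print(f"{development} of {total} respondents ({development_share:.0f}%) were in development roles")
```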


Performance Criteria 


Performance of the Experts 

Question Numbers: 10, 9 
Total Responses: 70 

If human experts currently perform (or previously performed) the task, how often is the expert(s) expected to 
give the correct answer? 

 2 Task not performed by human
17 'Correct' defined by expert
19 > 99%
16 95% to 99%
 4 90% to 95%
 4 80% to 90%
 1 60% to 80%
 0 40% to 60%
 4 Other (2 - 100%)
 3 I don't know


Expected Performance of the System 

Question Numbers: 11, 10 
Total Responses: 70 

How often is the Expert System expected to provide the correct answer? 

22 100%
16 > 99%
 9 95% to 99%
10 90% to 95%
 4 80% to 90%
 3 60% to 80%
 0 40% to 60%
 1 Other
 5 I don't know

Actual Performance of the System

Question Numbers: 12, 11 
Total Responses: 68 

What is your estimate of how often the Expert System actually provides the correct answer? 

11 100%
11 > 99%
12 95% to 99%
10 90% to 95%
 8 80% to 90%
 5 60% to 80%
 1 40% to 60%
 3 Other (< 40%)
 7 I don't know

Expected Problem Space Coverage 

Question Numbers: 8, 7 
Total Responses: 70 

How much of the problem space is the Expert System expected to cover? 

15 100%
12 > 99%
 6 95% to 99%
 1 90% to 95%
13 80% to 90%
 4 60% to 80%
 4 40% to 60%
 4 Other
 5 I don't know


Actual Problem Space Coverage 

Question Numbers: 9, 8 
Total Responses: 70 

What is your estimate of the problem space coverage actually provided by the Expert System? 

 4 100%
 3 > 99%
 8 95% to 99%
 3 90% to 95%
14 80% to 90%
19 60% to 80%
 8 40% to 60%
 1 Other (1 - 5%)
 8 I don't know

Requirements Format

Question Numbers: 13, - 
Total Responses: 62 

What was the basis for determining how the system was to behave? Please indicate the primary basis with a 
and check all other applicable basis (if any). 

Note: The number of times the choice was selected as primary basis is given in parentheses after the 
number of times the choice was selected. 

12 ( 4) A pre-existing document
19 ( 4) A requirements document completed as part of development
 6 (  ) Some other developed document
27 ( 4) A prototype of the system
49 (38) Expert consultation
-6(_) 


Requirements Difficulty 

Question Numbers: 14, - 
Total Responses: 63 

How difficult was it to develop the original concept of what the system was supposed to do? 

 1 Trivial
15 Easy
25 Medium
15 Hard
 1 Impossible

Availability of the Expert(s) 

Question Numbers: 17, - 

Total Responses: 53 

If the system was not developed by the expert, how much interaction was there between the expert(s) and 
the development team? 

 6 System was developed by expert
10 Constant
15 Frequent
17 Regular
 5 Occasional
 0 None


Number of Experts 

Question Numbers: 15, - 
Total Responses: 64 

Was more than one expert consulted during the development of the system? 

10 System was developed by expert
 6 Single expert
30 Multiple experts with lead
12 Committee of experts
 6 Other




Agreement Among Experts 

Question Numbers: 16, 21 
Total Responses: 61 

If more than one expert was available for consulting, how often did the experts agree on what results the 
Expert System was supposed to provide? 

 6 A single expert was involved
11 Always agree
44 Agree 75% of the time (range 30%-99%)

Expert in User Organization 

Question Numbers: -, 12

Total Responses: 5 

Was the expert(s) a member of the user organization? 

 5 Yes
 0 No
 0 User organization provided some expertise

Developers in User Organization 

Question Numbers: 18, 13 
Total Responses: 69 

Was the developer(s) of the Expert System part of the user organization?

25 Yes 
31 No 

13 Some development provided by user organization 


Development Information 

Development Life-Cycle Used 

Question Numbers: 19, - 
Total Responses: 58 

Please indicate which development model was used for developing the Expert System. 

 5 Requirements gathering preceded Design, Implementation, and Test (Traditional waterfall life-cycle).
12 Requirements gathered before development of a prototype. A second requirements activity preceded
   Design, Implementation, and Test.
25 Repetition of the Requirements, Design, Rule Generation, and Prototyping phases until production
   system (final prototype) was developed.
14 No effort was made to follow a particular model.
 2 Other

Languages and Tools Used

Question Numbers: 20, -
Total Responses: 64

What was the primary language/tool for the knowledge structures?

Note: The most frequent languages/tools are reported after the choice as: “frequency - language/tool.”
Knowledge Structures (17 - ESE, 13 - CLIPS, 10 - LISP, others)

Size of the System 

Question Numbers: 22, - 
Total Responses: 39 

Since Knowledge Bases can be written using several types of Knowledge Structures, please indicate how many
of the following structures were used. If another type of structure was used, please describe it and how many 
were used. 

Note: The number of times that a value was given for each choice is provided in parentheses followed by 
the average value for that response. The range of the responses is given in parentheses after each choice. 

(35) 235 Rules (range 30-1000) 

(15) 872 Frames (range 1-10000) 

(10) 248 Facts (range 50-800) 

(15) 121 Parameters (range 20-400) 

( 2) 8K Statements (2K - 16K) 

Total Development Effort 

Question Numbers: 29, - 
Total Responses: 57 

How much effort was expended in developing the system, including evaluation activities performed by the 
developers? 33 (range 1-200) person/months. 

Detailed Development Effort 

Question Numbers: 21, - 
Total Responses: 64 

What percentage of the total development effort was dedicated to each part of the Expert System? 

61 % Knowledge Structures 
8 % Inference Engine 
31 % Interface Code 
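
To give a rough sense of scale, the average effort split above can be combined with the average total effort of 33 person/months reported under "Total Development Effort". This is only a back-of-the-envelope sketch: the two averages come from different sets of responses (57 and 64), so the products below are illustrative assumptions rather than survey results.

```python
# Reported averages; combining them is an assumption of this sketch, since the
# two questions were answered by different numbers of respondents.
AVERAGE_TOTAL_EFFORT_PM = 33  # person/months, question (29, -)
EFFORT_SHARE_PERCENT = {      # question (21, -)
    "Knowledge Structures": 61,
    "Inference Engine": 8,
    "Interface Code": 31,
}

# The reported shares should account for the whole development effort.
assert sum(EFFORT_SHARE_PERCENT.values()) == 100

effort_by_part = {
    part: AVERAGE_TOTAL_EFFORT_PM * share / 100
    for part, share in EFFORT_SHARE_PERCENT.items()
}

for part, person_months in effort_by_part.items():
    print(f"{part}: about {person_months:.1f} person/months")
```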

System Sensitivity 

Question Numbers: 24, - 
Total Responses: 64 

When changes were made to the knowledge structures, how often did some unexpected result occur? 

 5 Never
44 Occasionally
 9 Frequently
 5 Usually
 1 Always

V&V Activities Performed


V&V Activities during development

Question Numbers: 28, - 
Total Responses: 63 

What testing activities were performed on the executing system? (indicate any that apply) 

 2 No evaluation was performed
38 Checked by expert(s)
32 Compared with expected results
28 Structural testing (e.g. cover all rules)
18 Other

V&V Activities after development 

Question Numbers: 33, 20 
Total Responses: 47 

What testing activities were performed on the executing system before the system was delivered to the users? 
(indicate any that apply) 

 1 No evaluation was performed
33 Checked by expert(s)
39 Compared with expected results
29 User acceptance
16 System run in parallel
 5 Other

Development Effort Spent on V&V

Question Numbers: 30, - 
Total Responses: 62 

How much of the development effort was spent on evaluation? 24 % (range 2%-80%) 

V&V of Knowledge Structures 

Question Numbers: 25, - 
Total Responses: 65 

What evaluation activities were performed on the Knowledge Structures? (indicate any that apply) 

 3 No evaluation was performed
28 Desk checking
15 Formal inspections
42 Checked by expert(s)
39 Structural testing (e.g. cover all rules)
 9 Other

V&V of Inference Engine

Question Numbers: 26, - 
Total Responses: 35 

What evaluation activities were performed on the Inference Engine? (indicate any that apply) 

17 No evaluation was performed (ES shell was used)
 2 No evaluation was performed
 3 Desk checking
10 Formal inspections
 5 Structural testing
   Other

V&V of Interface Code 

Question Numbers: 27, - 
Total Responses: 58 

What evaluation activities were performed on the Interface Code? (indicate any that apply) 

 7 No evaluation was performed
25 Desk checking
12 Formal inspections
29 Structural testing (branch or path)
18 Experts
   Other

Difficulty of V&V 

Question Numbers: 35, 22 
Total Responses: 67 

Compared to conventional software testing efforts, how difficult was the evaluation of the Expert System? 

 3 Trivial
16 Easy
20 Medium
20 Hard
 3 Impossible
 4 No evaluation was done

Separate V&V group 

Question Numbers: 31, - 
Total Responses: 62 

Did a separate organization evaluate the Expert System before it was delivered to the users? 

15 Yes, there was a separate evaluation organization. 

47 No, there was not a separate evaluation organization. 

Independent V&V Effort

Question Numbers: 32, 19 
Total Responses: 11 

If there was a separate evaluation team, how much effort was expended by the team in evaluating the cor- 
rectness of the Expert System? 

(11) 3 (range 1-7) person/months reported by developers 
(3) 16 (range 3-24) person/months reported by users 

Operational or Prototype System 

Question Numbers: 3, 3 
Total Responses: 70 

Is the Expert System operational or is it a prototype? 

42 Operational system 
25 Prototype system 
_3 Operational prototype (write in) 

System Criticality 

Question Numbers: 37, 15 
Total Responses: 69 

How reliable is the Expert System required to be? 

_7 Trusted with human life 
15 Trusted with mission objectives 
31 As reliable as the expert 
17 Assists the expert 
19 Assists the user 
Other 


V&V Issues Encountered 

Known Issues Actually Encountered 

Question Numbers: 36, 23 
Total Responses: 66 

Many people feel that some development issues are more of a problem with Expert Systems than with
conventional systems. Which (if any) of the following were problems during implementation or test of this
Expert System?

13 Understandability and readability of knowledge structures
34 Determining test coverage for knowledge structures
19 Modularity/Design of knowledge structures
30 Knowledge validation 
_6 Analysis of Certainty Factors 
_8 Validating the inference engine 
26 Real-time performance analysis 
26 Complexity of the Problem 

14 Certification 

_9 Configuration Management 
6 Other

Certainty Factors

Question Numbers: 7, - 
Total Responses: 64 

Does the Expert System include certainty factors? 

1 Yes 
54 No 

_3 I don't know 

Configuration Management 

Question Numbers: 34, - 
Total Responses: 45 

How were changes to the Expert System distributed to the users? 

_5 User updated system at developer's direction 
18 Developers made changes to users' system 
_1 Untested system distributed to users 
22 Tested system distributed to the users 
_3 Configuration management group distributes system 
1 Other 


Expertise Implementation Difficulty 

Question Numbers: 23, - 
Total Responses: 62 

Aside from any difficulties in developing the original concept, how difficult was it to express the behavior 
(through the Knowledge Structures) of the expert? 

_3 Trivial 
16 Easy 
20 Medium 
20 Hard 
_3 Impossible 


Expected System Use 

Question Numbers: 39, 17 
Total Responses: 50 

How many people are expected to make use of the Expert System? 219 (range 1-2000) 


Perceived System Reliability 

Question Numbers: 38, 16 
Total Responses: 68 


Does the Expert System seem to be more reliable or less reliable than conventional systems that are in use?

_9 Significantly more reliable 
16 More reliable 
_3 Slightly more reliable 
19 Similar reliability 
_2 Slightly less reliable 
_1 Less reliable 
Significantly less reliable
14 No comparison is available
_4 I don't know 

User Trust 

Question Numbers: 14 

Total Responses: 5 

Why do you believe the results that the system gives? 

_1 Expert says it is correct 
_3 Participated in evaluation 
_ Someone I trust did evaluation 
_5 Personal use and checking 
_1 User acceptance 

I don't trust the results 

Other

Summary of Interview Results

In addition to acquiring written responses to the survey questions, interviews were performed to gather
additional data and to clarify questions concerning the written responses. Additional information from these
interviews is summarized in this section.

Structural Testing: Based on the survey results, structural testing was a commonly used evaluation approach.
This was surprising because structural testing was felt to be relatively difficult to apply to expert systems.
From the interviews, we learned that although some projects did attempt to measure the actual test coverage
(i.e., the percentage of rules executed during testing), many others did not. Instead, they attempted to develop
test cases that would cover all of the knowledge base (or at least the important parts) but made no attempt to
measure how well the knowledge base was actually covered. Also, there appeared to be no attempt to cover
interactions between knowledge base elements (e.g., rule interactions); each element was tested as if it were
an independent piece of the knowledge base. Some knowledge base developers felt that more formal structural
testing would take too much effort and would hinder the development process. In conclusion, although
structural testing was used, it was a very weak form of structural testing (at least compared to, say, branch
coverage in procedural software testing).
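
The difference between designing test cases that are meant to cover the knowledge base and actually measuring
which rules fired can be illustrated with a minimal sketch. The rule names, the Rule structure, and the toy
forward-chaining loop below are hypothetical and do not represent any surveyed system or commercial shell; the
sketch only shows one way the "percentage of rules executed during testing" could be computed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

Facts = Dict[str, bool]


@dataclass
class Rule:
    name: str
    condition: Callable[[Facts], bool]  # decides whether the rule may fire
    conclusion: str                     # fact asserted when the rule fires


def run_rules(rules: List[Rule], facts: Facts, fired: Set[str]) -> Facts:
    """Naive forward chaining that records the name of every rule that fires."""
    facts = dict(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if not facts.get(rule.conclusion) and rule.condition(facts):
                facts[rule.conclusion] = True
                fired.add(rule.name)
                changed = True
    return facts


def rule_coverage(rules: List[Rule], test_cases: List[Facts]) -> float:
    """Percentage of rules executed at least once across all test cases."""
    fired: Set[str] = set()
    for case in test_cases:
        run_rules(rules, case, fired)
    return 100.0 * len(fired) / len(rules)


# Hypothetical three-rule knowledge base (illustration only).
RULES = [
    Rule("R1", lambda f: f.get("low_pressure", False), "leak_suspected"),
    Rule("R2",
         lambda f: f.get("leak_suspected", False) and f.get("valve_open", False),
         "close_valve"),
    Rule("R3", lambda f: f.get("high_temp", False), "overheat"),
]

# A single test case that never exercises R3.
TESTS = [{"low_pressure": True, "valve_open": True}]

print(f"rule coverage: {rule_coverage(RULES, TESTS):.1f}%")
```

Note that this measures rule-level coverage only. It says nothing about interactions between rules, which the
interviews indicated were also left untested.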

Experts Developing Expert Systems: It appeared that the expert was heavily relied upon to aid in evaluation
of the knowledge base; this subject was probed more deeply during the interviews. It seems that close
interaction between the expert and the knowledge base developer was mandatory to successfully develop an
expert system. This is not a surprising result, and it has been discussed at length in the literature. However, it
was surprising to learn that many knowledge base developers feel that this interaction is so important that
the best approach is simply to have the expert develop the system. One non-programmer interviewee, who
felt that his group was succeeding at having experts develop their own systems, also thought that this
approach would have to be altered to some extent in order to succeed with the more sophisticated types of
expert systems that they would be developing in the future.

Requirements Writing and the Conventional Software Life-Cycle: It was anticipated that expert systems
were being developed using a much more iterative and less structured life-cycle than the conventional and
rigid waterfall model. Although the subject of life-cycle models was not intentionally addressed during
the interviews, it often came up when discussing requirements. It seems that several respondents associated
“requirements” with the conventional waterfall model, and they felt very strongly that the conventional
approaches to software development, such as the waterfall model, were much too formal and structured for
expert systems development; that is, it would be disastrous to apply them to expert systems. For some, this
feeling extended to requirements themselves; others simply used a different approach to requirements. For
example, in some cases, requirements were not written because it was felt that a requirements document was
a formally written paper document that needed to be “approved” before development could proceed. In
other cases, an iterative prototyping development effort took place and was followed by documenting
system requirements; these requirements were then used to test the system to ensure that it worked as
everyone thought it (supposedly) did.

Prototypes vs. Operational Systems: Although we attempted to get respondents to state that their system
was either “a prototype” or “operational,” we received indications that this distinction was not easy to make
in practice. For example, responses included “it is both a prototype and operational,” “it is an operational
prototype,” and “it is just a prototype but we have many users.” It seems that some systems are originally
intended to be prototypes but come to be used operationally. Some developers intentionally approach the
development of an operational system by first developing a “prototype”; once the prototype is
“certified,” it is considered “operational.” However, there is a danger that a prototype will be used as if it
were operational. Some have made efforts to ensure that a system that was only intended to be a prototype
was not accidentally relied upon in an operational setting.

Real-Time Performance Analysis: It was intended that “real-time performance analysis” would refer to the
ability to predict the response time of an expert system, that is, the ability to analyze the time performance
of the system. However, from the interviews we learned that many interpreted “real-time performance
analysis” to mean the ability to get the system to run as fast as desired or necessary.


Issues Independent of a System Being an Expert System

An important, but difficult, aspect of analyzing expert system development methodology is distinguishing 
properties of expert systems that are significantly different from properties of conventional software. This is 
also an important aspect of the analysis of this survey of V&V issues. Several comments appeared to be due 
more to factors other than the fact that the system being developed was an “expert” system. The interviews 
helped clarify this issue which the remainder of this section discusses. 

Extensive Use of Prototyping and Rapid Development: The conventional waterfall life-cycle model has
proven to be ineffective for conventional software development, so it is no surprise that developers do not
want to use it for expert system development. A more iterative model (e.g., the spiral model) that includes
the use of rapid prototyping is being perceived as a better alternative to the waterfall model. “Conventional”
software development projects often include the use of prototyping, developing better user interfaces, having
more user involvement during development, or having developers better understand the problem domain;
these are not issues or approaches that are unique to expert system development.

Small/Simple vs. Large/Complex Systems: Although some of the systems surveyed are fairly large (e.g.,
200 person-months), they are generally much smaller than dedicated software development projects (e.g.,
Shuttle MCC, Shuttle flight software, etc.). The systems surveyed seem to be isolated efforts to develop
off-line applications for niches for which expert system technology was felt to be very suitable. That is, they
were not part of a larger software system, though they are often used in conjunction with
a large data processing system (e.g., they receive real-time data from a large data processing system). This
allowed the expert system developers to work without many of the constraints imposed on larger systems
(e.g., tightly controlled configuration management).

Addressing a Knowledge Engineer Instead of a Programmer: Although we did not intend to gather
information on the experience and background of individual expert system developers, we did learn that several
respondents involved in developing expert systems are experts in a problem domain and do not have much
programming experience. This fact will be important when considering recommendations (see
“Recommendations” on page 22); that is, the recommendations should not assume first-hand knowledge of
conventional software V&V techniques.

Summary: It may be the case that the above issues are indeed typical of expert system development 
projects and that they should be addressed when addressing V&V of expert system problems. However, it 
should be recognized that they are somewhat different than the other issues that are true of all expert systems 
regardless of their size and who is developing them. This may point to a need to tailor suggestions for V&V 
of expert systems to considerations such as the size of the expert system, the experience of the developer, 
whether the system is embedded in a much larger software system, etc.

Recommendations

The recommendations from the survey results are separated into two categories: 

Direct Recommendations 

Recommendations in this category are directly supported by the survey results. These
recommendations include:

• Develop Requirements for Expert System Verification and Validation 

• Address Most Often Encountered Issues 

• Recommend a Life Cycle for Expert Systems Development 

Inferred Recommendations 

Recommendations in this category can be inferred from the survey results by analyzing
relationships among the responses. These recommendations include:

• Address Readability and Modularity Issues 

• Address Configuration Management Issue 

• Develop Criteria to Classify Expert Systems by Intended Use

• Investigate Applicability of Analysis Tools 

Following each general recommendation is an explanation of what was observed in the survey results. After
this explanation is a list of specific recommendations which address all the observations. Each specific
recommendation in the “Direct Recommendations” section is followed by a list of supporting phrases from
“Summary of Results” on page 8.


Direct Recommendations 

Develop Requirements for Expert System Verification and Validation 

The major goal of this survey task was to discover and document the current state of the practice in
Verification and Validation of Expert Systems. Based on the survey results, it appears that much can be done to
improve the practice. The lack of requirements for performing V&V on ESs was manifested in several
forms:

• The V&V activities performed were very inconsistent, ranging from none to very many, and the sets of 
activities performed were very diverse. 

• The reliance on expert consultation as the only source of requirements was extremely high. 

• The reliance on experts to perform V&V activities on the knowledge base, interface code, and executing 
systems was very high. 

• The low performance levels for many of the expert systems were surprising. Although it is not known
what is acceptable reliability for the systems that were surveyed, often the estimated actual reliability was
less than the expected reliability. Also, it is unlikely that conventional software systems that exhibited a
similar level of performance would gain wide acceptance. (For example, many reported that the ES
provides the correct answer less than 90% of the time. Most conventional software reliability is rated as
a series of '9's, e.g., 4 '9's means the correct answer is given > 99.99% of the time.)

• In those cases where the expected behavior of the system was not strictly defined by expert consultation, 
a large number of systems relied on prototypes. This is significant because prototype systems receive less 
V&V than operational systems, but are then used to define the behavior of operational systems. 

Each of the above observations can be directly attributed to three factors: 

1. There is a general lack of understanding on how to V&V ESs. The wide ranging use of V&V
approaches (e.g., each technique being used as the sole technique by at least one project) indicates that
there is no clear approach to V&V. That is, it is not known what V&V activities are to be performed,
when the activities should be performed, or how the activities can be accomplished. This could, in part,
be due to the software experience level of some of the developers.

2. There is little understanding of how requirements for an ES should be generated and documented. It 
could be argued that this is a development issue, but without documented expected behavior, there is no 
possibility of performing adequate V&V. 

3. A large number of expert systems are prototypes for which V&V receives little consideration. 
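
The "series of 9s" convention used in the reliability observation above can be made concrete with a short
calculation. The helper name nines is illustrative only; the sketch simply converts a fraction of correct
answers into the equivalent count of nines, so that a system that is right 90% of the time can be compared with
one rated at four nines.

```python
import math


def nines(reliability: float) -> float:
    """Number of '9's for a reliability expressed as a fraction in [0, 1)."""
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    # 0.9 -> 1 nine, 0.99 -> 2 nines, 0.9999 -> 4 nines, and so on.
    return -math.log10(1.0 - reliability)


for r in (0.90, 0.99, 0.9999):
    print(f"{r:.4%} correct -> {nines(r):.2f} nines")
```

Under this measure, a system that is correct less than 90% of the time rates below one nine.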

Recommendations 

1. Develop recommendations and/or guidelines for Verification and Validation of Expert Systems. (Since 
such a significant amount of research has been devoted to V&V of traditional software, it may be appro- 
priate to approach this task as a set of modifications to current conventional software V&V require- 
ments.) These guidelines should include the ability for customization based on system size, developer 
software experience, whether it is stand-alone or a part of a much larger system, etc. 

“75% of the respondents indicated that expert consultation was a basis for determining the behavior 
of the system.” 

"Most V&V activities relied on comparison with expected results and expert checking” 

“In most cases, there was not a separate group to perform V&V” 

2. Initial efforts to define V&V requirements should be focused on diagnostic systems, since a large 
majority of the systems surveyed performed diagnostic services. 

“Most ... perform Diagnosis (45%, 80) ...” 

3. Research the process of converting prototype ESs into operational systems. A large number of respond- 
ents indicated that they were either building prototypes for later conversion into operational systems, or 
building operational systems based on prototypes. 

“43% of respondents indicated that prototypes or similar tools were used for the requirements” 

“39% of the respondents indicated that the ES was a prototype system.” 

Address Most Often Encountered Issues 

All of the known issues with performing V&V on Expert Systems were cited at least once in the survey. A 
small group of issues, however, were cited significantly more often than others and included: 

1. Determining test coverage, 

2. Knowledge validation, 

3. Real-time performance analysis 

4. Complexity of the problem 

The first two issues are well understood and are active research areas. These research areas should be
matured so that solutions to these issues can be provided.

The issue of real-time performance analysis was briefly discussed earlier (see “Summary of Interview Results”
on page 20). Since this issue may most often be interpreted as the inability to get the expert system to run
fast enough, and this is not a V&V issue, it is not clear that any recommended action is needed. However, it
did appear from the descriptions of the expert systems that the ability to predict the response time of the
system should not be a major issue for current expert systems, so it is not felt that any recommendation is
needed at this time.

The complexity issue is not as well understood. There is considerable opinion that the types of problems
addressed by ESs are significantly harder than the problems addressed by conventional software. Others
maintain the apparent difficulty is attributable to the lack of requirements (see above). In either case, there
does not seem to be a way to approach the complexity issue without considering it in the context of the
readability and modularity issues, as done in “Address Readability and Modularity Issues” on page 24.

Recommendations 

1. Develop tools and/or methods to support the determination of test coverage.

“The known issues most often cited as problems were: test coverage determination (50%, 75%) ...”

2. Develop methods and/or tools to support the knowledge validation activity.

“The known issues most often cited as problems were: ... knowledge validation (44%, 75%) ...”

3. Develop methods and/or tools to assist in managing problem complexity.

“The known issues most often cited as problems were: ... problem complexity (39%, 40%) ...”

Recommend a Life Cycle for Expert Systems Development 

The most common Life Cycle applied to the development of the ESs included in this survey was the Cyclic 
model. In the Cyclic model, the stages of requirements, design, knowledge base development, and test are 
repeated until the final system is developed. The testing activities at the end of each cycle (except the last) 
lead to the refinement of the requirements that will be used in the successive cycle. Several variations, 
including some with a fixed number of cycles, have been proposed. 

A large number of respondents, however, indicated that no attempt was made to follow any model. If no 
model is being followed, there is little opportunity to apply V&V activities at the appropriate points during 
development. Clearly, any life cycle guidelines would be of benefit in these situations. Multiple life-cycle 
approaches, or a single very flexible life-cycle should be recommended. 

Recommendation 

1. Multiple life cycle models, or a single, very flexible life cycle model should be recommended for develop- 
ment of ESs. (The high incidence of prototypes leading to operational systems suggests that the cyclic 
model should be recommended. Rapid prototyping could be treated as a special case of the cyclic 
model.)

“The most frequent (40%) Life-Cycle model used is the Cyclic Model ... however, 22% ... stated
that no model was followed.”

“43% of respondents indicated that prototypes or similar tools were used for the requirements”

“(39%, 20%) of the respondents indicated that the ES was a prototype system.”


Inferred Recommendations 
Address Readability and Modularity Issues 

Readability and modularity were expected to be significant issues, but were not the most frequently cited
problems. Further analysis of the survey results indicates that the readability and modularity issues may have
been reported as other problems. This analysis includes the following observations:

• As often as not, people chose modularity or readability as problems, but not both. This seems to
indicate that many respondents do not see the relationship between the two.

• Similarly, as often as not, people picked test coverage determination without picking modularity, so the
apparent relationship between these two issues was not established.

• The lack of reported relationships between the readability, modularity, and test coverage issues is very
confusing, implying, for instance, that a rule can be understood but a test scenario for it cannot be
developed.

• Readability and complexity of the problem were very rarely chosen together. That is, the developer
recognized that the ES was complicated but attributed this complexity either to the problem or to the
solution, but not both. It is questionable that the complexity of the problem and the complexity of the
solution can be easily distinguished. (The emergence of object-oriented programming languages is due,
in part, to the claim that conventional languages cause programming complexities which are erroneously
attributed to problem complexity.)

If the numbers of times each of these issues was reported are added together, the collection of issues becomes
a very frequently cited problem. Since these issues are so closely interrelated, they should be addressed as a
single issue. Therefore, the problem of reducing overall complexity (problem/solution) is a very important
issue.
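
The kind of relationship analysis described above (how often two issues were chosen together versus separately,
and how large the combined group becomes) can be sketched as follows. The response sets below are invented for
illustration and are NOT the survey data; the function names are likewise hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical multiple-choice answers, one set per respondent (not the survey data).
RESPONSES: List[Set[str]] = [
    {"readability", "test_coverage"},
    {"modularity"},
    {"readability", "modularity"},
    {"complexity"},
    {"test_coverage"},
    {"complexity", "modularity"},
]


def co_occurrence(responses: List[Set[str]], a: str, b: str) -> Dict[str, int]:
    """Count respondents who chose both issues versus exactly one of them."""
    both = sum(1 for r in responses if a in r and b in r)
    only_one = sum(1 for r in responses if (a in r) != (b in r))
    return {"both": both, "only_one": only_one}


def summed_citations(responses: List[Set[str]], issues: Set[str]) -> int:
    """Add together the number of times each issue in the group was reported."""
    return sum(len(r & issues) for r in responses)


def respondents_citing_any(responses: List[Set[str]], issues: Set[str]) -> int:
    """Count respondents who reported at least one issue in the group."""
    return sum(1 for r in responses if r & issues)


group = {"readability", "modularity", "complexity"}
print(co_occurrence(RESPONSES, "readability", "modularity"))
print(summed_citations(RESPONSES, group), respondents_citing_any(RESPONSES, group))
```

Summed citations count a respondent once per issue chosen, while the last function counts each respondent at
most once; which figure is more appropriate depends on how the combined issue is to be compared with the others.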

Recommendation 

1. Develop methods and/or tools to support the readability, modularity, and problem complexity issue. 


Address Configuration Management Issue 

Configuration management was an infrequently cited problem. However, the survey results also show that
in practice the applied CM, while sometimes quite good, was generally poor (changes to the knowledge base
were not well managed). This contradiction is probably due to the high frequency of prototypes and
'In development' responses to the survey. While there are certain applications for which CM may never be a
significant issue, certainly there are applications for which CM is a very important issue.

Recommendation 

1. Identify the differences between CM of conventional software systems and CM of expert systems. It is 
not immediately obvious that there are differences. 


Develop Criteria to Classify Expert Systems by Intended Use 

The survey results indicate that there is a very diverse set of applications which are utilizing ES technology. 
At least the following types of applications exist: 

Expert Clone 

Provides expert assistance to a human user. The expert is usually available if the ES does not
provide the correct results. The major uses of this type include education and the capture of true
institutional knowledge.

Expert Assistant 

Allows the user, typically an expert, to concentrate on the more important aspects of the task. 
These ESs typically serve as filtering mechanisms. 

Autonomous 

Limited supervision is applied to the ES. In addition to providing filtering, these systems
typically develop and execute plans to handle situations.

A subcategory of Autonomous ESs are time critical ESs. These ESs exist primarily because 
experts can not interpret data efficiently enough to perform the task in the allotted time. 

Self-modifying autonomous 

Part of the planned execution is to modify its knowledge base to respond to certain situational 
data. The application of V&V to this type of problem is currently uncertain. 

Traditional Software Problem 

Some conventional problems (e.g., discrete event simulation) are more conveniently
implemented using expert system shells.

It is apparent that because of this diversity, a single set of V&V requirements is probably undesirable.
Development of classification criteria allows a simplification of ES V&V requirements. In addition to
simplification, classification allows the development of requirements to be concentrated on the types of
applications of interest.

Recommendations 

1. Develop classification criteria to distinguish among expert systems which require different V&V 
approaches. 

2. Concentrate initial V&V requirements definition effort on autonomous systems, since these systems are
likely the most critical.

Investigate Applicability of Analysis Tools 

A very large number of respondents indicated that experts were the primary source of requirements and ver- 
ification. Several of the previous recommendations would reduce this dependence, but there is a class of 
expert system applications for which expert consultation will continue to be the leading source. 

Recommendations 

1. Determine if there is a communication problem between the experts and the knowledge engineers /
expert system developers.

2. If a communication problem exists, investigate the possibility of representing the Knowledge Base in a form
that domain experts can easily, yet accurately, understand.

Appendix A. Detailed results


The following table represents the raw data from the survey of expert system developers. Except for
questions number 1 and 4 (see note 1), there is a column in the table for each question in the survey. The column
headers have a number in parentheses corresponding to the question number in the survey. There is also a
short mnemonic representing the subject of the question to facilitate cross reference to the correct survey
question.

Note 1: Answers to questions 1 and 4 are not provided because these would identify the survey respondent.

Summary of Developers Responses (part 1)

Summary of Developers Responses (part 2)

Summary of Users Responses


Appendix B. Expert Systems Evaluation Questionnaire (Developer)


By filling out this NASA funded questionnaire, you can help define the state-of-the-practice in the formal 
evaluation of Expert Systems on current NASA and industry applications. The information that you 
provide will be merged with the information from all other surveyed projects for the purpose of recom- 
mending future research and development activities. Individual responses are used solely as input to this 
information merging process. Each survey participant will be sent a copy of the final survey results. 


Expert System applications are becoming more prevalent in fields where proper functioning is essential, such 
as the aerospace, medical, and financial industries. It is widely claimed that Expert Systems are not as rigor- 
ously evaluated as traditional software because of unique, unresolved evaluation issues. To ensure the con- 
tinued and safe deployment of Expert Systems into critical areas, adequate evaluation techniques which 
address these issues must be developed and performed. 


Instructions 

The following questions concern your experiences with an Expert System, either as a developer or as the 
manager of the development effort. Feel free to indicate your answers in any way you like. Some of the 
choices on the multiple choice questions have places to fill in additional information; please indicate the 
choice and include the additional information, if possible. If you have any comments about the questions or 
your answers, please write them in the left margin. 

Analysis of the responses may indicate that further discussion is required for complete understanding of the 
issues encountered during the evaluation process. Discussions will be held either as short one-on-one 
meetings or by telephone. Would you be available, at your convenience, to discuss the evaluation process in 
more detail? 

Yes, I am available for discussions.

Name

Phone

No, I am not available for discussions.


If you have any questions regarding this questionnaire, please contact Keith Kelley at (713) 282-7303. If 
possible, please return completed questionnaires within one week of receipt to: 

Keith Kelley 
MC 6606 

IBM Federal Sector Division 
3700 Bay Area Blvd. 

Houston, Tx. 77058-1199

Definitions

Certainty factors 

Some problems require the use of certainty factors (also called “probabilities” or fuzzy logic) in
their processing. Facts which contain certainty factors have the form: “if a is true, then there is
an x% chance that b is true.”

Expert 

The person who provides the knowledge that is to be captured in the Expert System. 

Inference engine 

Processes the knowledge structures to infer a set of output facts from a set of input facts. Exam- 
ples of commercial systems are CLIPS and ESE. 

Interface code 

Used to supplement the inference process. Examples are interfacing the inference engine to a 
device, and performing arithmetic calculations. 

Knowledge structures 

Declarative part of the Expert System which represents the knowledge (typically called the 
Knowledge Base). Examples are frames and rules. 

Problem space 

The total number of cases which could potentially be addressed by the Expert System. 

Problem space coverage 

The percentage of the problem space that is addressed by the Expert System. For example, if the 
Expert System is supposed to be able to diagnose 100 malfunctions, but the total number of 
malfunctions is known to be 200, the problem space coverage is 50%. 
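
As a small worked example of the problem space coverage definition above, the calculation can be written
as a one-line function. The function name and argument names are illustrative only and are not part of the
questionnaire.

```python
def problem_space_coverage(addressed_cases: int, total_cases: int) -> float:
    """Percentage of the problem space that the Expert System addresses."""
    if total_cases <= 0:
        raise ValueError("total_cases must be positive")
    if not 0 <= addressed_cases <= total_cases:
        raise ValueError("addressed_cases must be between 0 and total_cases")
    return 100.0 * addressed_cases / total_cases


# The example from the definition: 100 of 200 known malfunctions are diagnosable.
print(problem_space_coverage(100, 200))
```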

Questions 

1. What is the name of the Expert System you were/are involved with? 


2. Were you a developer of the Expert System or the manager of the development organization? 

a. Developer of Expert System 

b. Manager of Expert System development organization 

c. Other 


3. Is the Expert System operational or is it a prototype? 

a. Operational system b. Prototype system 

4. Briefly describe what the expert system does.

5. What field does the problem belong to?

a. Aerospace              g. Medical
b. Financial              h. Personnel
c. Information Systems    i. Research
d. Hardware               j. Service
e. Manufacturing          k. Software
f. Marketing              l. Other


6. Which of the following items best describes the kind of problem the Expert System addresses? Please
indicate primary purpose with a and check all other applicable purposes (if any). 

a. Design - Configuring objects under constraints 

b. Repair - Executing plans to administer prescribed remedies 

c. Control - Governing overall system behavior 

d. Planning - Designing actions 

e. Diagnosis - Inferring system malfunctions from observables 

f. Debugging - Prescribing remedies for malfunctions 

g* Prediction * Inferring likely consequences of given situations 

h. Monitoring - Comparing observations to expected outcomes 

i. Instruction - Diagnosing, debugging, and repairing behavior 

j. Interpretation - Inferring situation descriptions from sensor data

k. Classification - Categorizing objects by properties

7. Does the Expert System include certainty factors? 

a. Yes c. I don't know

b. No 

8. How much of the problem space is the Expert System expected to cover? 


a. 100%          f. 60% to 80%

b. > 99%         g. 40% to 60%

c. 95% to 99%    h. Other %

d. 90% to 95%    i. I don't know

e. 80% to 90%



9. What is your estimate of the problem space coverage actually provided by the Expert System?

a. Same as expected    f. 80% to 90%

b. 100%                g. 60% to 80%

c. > 99%               h. 40% to 60%

d. 95% to 99%          i. Other %

e. 90% to 95%          j. I don't know


Questions 10 through 12 are concerned with the percentage of problems within the problem space (covered 
by the Expert System) that are answered correctly. 

10. If human experts currently perform (or previously performed) the task, how often is the expert(s)
expected to give the correct answer?

a. Task not performed by human    f. 80% to 90%

b. 'Correct' defined by expert    g. 60% to 80%

c. > 99%                          h. 40% to 60%

d. 95% to 99%                     i. Other

e. 90% to 95%                     j. I don't know

11. How often is the Expert System expected to provide the correct answer?

a. 100%          f. 60% to 80%

b. > 99%         g. 40% to 60%

c. 95% to 99%    h. Other %

d. 90% to 95%    i. I don't know

e. 80% to 90%




12. What is your estimate of how often the Expert System actually provides the correct answer? 


a. 100%          f. 60% to 80%

b. > 99%         g. 40% to 60%

c. 95% to 99%    h. Other %

d. 90% to 95%    i. I don't know

e. 80% to 90%




13. What was the basis for determining how the system was to behave? Please indicate the primary basis
with a and check all other applicable bases (if any).

a. A pre-existing document

b. A requirements document completed as part of development

c. Some other developed document

d. A prototype of the system

e. Expert consultation

f. Other


14. How difficult was it to develop the original concept of what the system was supposed to do? 

a. Trivial d. Hard 

b. Easy e. Impossible 


c. Medium


15. Was more than one expert consulted during the development of the system? 

a. System was developed by expert d. Committee of experts 

b. Single expert e. Other 

c. Multiple experts with lead 

16. If more than one expert was available for consulting, how often did the experts agree on what results 
the Expert System was supposed to provide? 


a. A single expert was involved 

b. Always agree 


c. Agree % of the time.


17. If the system was not developed by the expert, how much interaction was there between the expert(s)
and the development team?

a. System was developed by expert    d. Regular

b. Constant                          e. Occasional

c. Frequent                          f. None

18. Was the developer(s) part of the user organization?

a. Yes

b. No

c. Some developers were in the user organization


19. Please indicate which development model was used for developing the Expert System. 

a. Requirements gathering preceded Design, Implementation, and Test (Traditional waterfall life-cycle).

b. Requirements gathered before development of a prototype. A second requirements activity preceded
Design, Implementation, and Test.

c. Repetition of the Requirements, Design, Rule Generation, and Prototyping phases until production
system (final prototype) was developed.

d. No effort was made to follow a particular model. 

e. Other 


20. What was the primary language/tool for each part of the Expert System? 

a. Knowledge Structures 

b. Inference Engine 

c. Interface Code 


21. What percentage of the total development effort was dedicated to each part of the Expert System? 

a. Knowledge Structures % 

b. Inference Engine % (If an Expert System Shell was used, this value should be 0%.) 

c. Interface Code % 


22. Since Knowledge Bases can be written using several types of Knowledge Structures, please indicate how
many of the following structures were used. If another type of structure was used, please describe it 
and how many were used. 


a. Rules     d. Parameters

b. Frames    e. Statements

c. Facts     f. Other (#) of


23. Aside from any difficulties in developing the original concept, how difficult was it to express the 
behavior (through the Knowledge Structures) of the expert? 


a. Trivial    d. Hard

b. Easy       e. Impossible

c. Medium


24. When changes were made to the knowledge structures, how often did some unexpected result occur?

a. Never           d. Usually

b. Occasionally    e. Always

c. Frequently

Questions 25 through 28 are concerned with the evaluation activities performed during development.

25. What evaluation activities were performed on the Knowledge Structures? (indicate any that apply)



a. No evaluation was performed    d. Checked by expert(s)

b. Desk checking                  e. Structural testing (e.g. cover all rules)

c. Formal inspections             f. Other

26. What evaluation activities were performed on the Inference Engine? (indicate any that apply)

a. No evaluation was performed    d. Structural testing

b. Desk checking                  e. Other

c. Formal inspections



27. What evaluation activities were performed on the Interface Code? (indicate any that apply)

a. No evaluation was performed    d. Structural testing (branch or path)

b. Desk checking                  e. Other

c. Formal inspections



28. What testing activities were performed on the executing system? (indicate any that apply)

a. No evaluation was performed       d. Structural testing (e.g. cover all rules)

b. Checked by expert(s)              e. Other

c. Compared with expected results




29. How much effort was expended in developing the system, including evaluation activities performed by 
the developers? person/months. 


30. How much of the development effort was spent on evaluation? %. 

31. Did a separate organization evaluate the Expert System before it was delivered to the users? 

a. Yes, there was a separate evaluation organization.

b. No, there was not a separate evaluation organization.

32. If there was a separate evaluation team, how much effort was expended by the team in evaluating the
correctness of the Expert System? person/months.


33. What testing activities were performed on the executing system before the system was delivered to the 
users? (indicate any that apply) 


a. No evaluation was performed       d. User acceptance

b. Checked by expert(s)              e. System run in parallel

c. Compared with expected results    f. Other

34. How were changes to the Expert System distributed to the users?

a. User updated system at developer's direction

b. Developers made changes to users' system

c. Untested system distributed to users

d. Tested system distributed to the users

e. Configuration management group distributes system

f. Other



35. Compared to conventional software testing efforts, how difficult was the evaluation of the Expert
System?

a. Trivial    d. Hard

b. Easy       e. Impossible

c. Medium     f. No evaluation was done

36. Many people feel that some development issues are more of a problem with Expert Systems than with
conventional systems. Which (if any) of the following were problems during implementation or test of
this Expert System?

a. Understandability and readability of knowledge structures

b. Determining test coverage for knowledge structures

c. Modularity/Design of knowledge structures

d. Knowledge validation

e. Analysis of Certainty Factors

f. Validating the inference engine

g. Real-time performance analysis

h. Complexity of the Problem

i. Certification

j. Configuration Management

k. Other




37. How reliable is the Expert System required to be?

a. Trusted with human life            d. Assists the expert

b. Trusted with mission objectives    e. Assists the user

c. As reliable as the expert          f. Other


38. Does the Expert System seem to be more reliable or less reliable than conventional systems that are in
use?

a. Significantly more reliable    f. Less reliable

b. More reliable                  g. Significantly less reliable

c. Slightly more reliable         h. No comparison is available

d. Similar reliability            i. I don't know

e. Slightly less reliable




39. How many people are expected to make use of the Expert System?

40. How frequently are the (expected) users actually using the system? (Numbers may add up to more
than 100% if the actual number of users is greater than the expected users.)

a. % use the system more than expected

b. % use the system about as much as expected

c. % use the system less than expected

d. % do not use the system

Appendix C. Expert Systems Evaluation Questionnaire (User)

By filling out this NASA-funded questionnaire, you can help define the state-of-the-practice in the formal
evaluation of Expert Systems on current NASA and industry applications. The information that you
provide will be merged with the information from all other surveyed projects for the purpose of
recommending future research and development activities. Individual responses are used solely as input to
this information merging process. Each survey participant will be sent a copy of the final survey results.

Expert System applications are becoming more prevalent in fields where proper functioning is essential, such
as the aerospace, medical, and financial industries. It is widely claimed that Expert Systems are not as
rigorously evaluated as traditional software because of unique, unresolved evaluation issues. To ensure the
continued and safe deployment of Expert Systems into critical areas, adequate evaluation techniques which
address these issues must be developed and performed.

Instructions 

The following questions concern your experiences with an Expert System, either as a user or as the manager 
of a department that uses the Expert System. Feel free to indicate your answers in any way you like. Some of
the choices on the multiple choice questions have places to fill in additional information; please indicate the 
choice and include the additional information, if possible. If you have any comments about the questions or 
your answers, please write them in the left margin. 

Analysis of the responses may indicate that further discussion is required for complete understanding of the 
issues encountered during the evaluation process. Discussions will be held either as short one-on-one 
meetings or by telephone. Would you be available, at your convenience, to discuss the evaluation process in 
more detail? 

Yes I am available for discussions. 

Name 

Phone 

No I am not available for discussions. 

If you have any questions regarding this questionnaire, please contact Keith Kelley at (713) 282-7303. If 
possible, please return completed questionnaires within one week of receipt to: 

Keith Kelley 
MC 6606 

IBM Federal Sector Division 
3700 Bay Area Blvd.

Houston, Tx. 77058-1199 

Definitions 

Expert 

The person who provides the knowledge that is to be captured in the Expert System. 

Inference engine 

Processes the knowledge structures to infer a set of output facts from a set of input facts.
Examples of commercial systems are CLIPS and ESE.

Knowledge structures 

Declarative part of the Expert System which represents the knowledge (typically called the 
Knowledge Base). Examples are frames and rules.

Problem space

The total number of cases which could potentially be addressed by the Expert System. 

Problem space coverage 

The percentage of the problem space that is addressed by the Expert System. For example, if the 
Expert System is supposed to be able to diagnose 100 malfunctions, but the total number of 
malfunctions is known to be 200, the problem space coverage is 50%. 

Questions 

1. What is the name of the Expert System you were/are involved with? 


2. Are you a user of the Expert System or the manager of a department which uses the Expert System? 

a. User of the Expert System 

b. Manager of a department using the Expert System 

c. Other __ 

3. Is the Expert System operational or is it a prototype? 

a. Operational system b. Prototype system 

4. Briefly describe what the expert system does. 


5. What field does the problem belong to? 


a. Aerospace              g. Medical

b. Financial              h. Personnel

c. Information Systems    i. Research

d. Hardware               j. Service

e. Manufacturing          k. Software

f. Marketing              l. Other

6. Which of the following items best describes the kind of problem the Expert System addresses? Please
indicate primary purpose with a and check all other applicable purposes (if any).

a. Design - Configuring objects under constraints 

b. Repair - Executing plans to administer prescribed remedies

c. Control - Governing overall system behavior 

d. Planning - Designing actions 

e. Diagnosis - Inferring system malfunctions from observables 

f. Debugging - Prescribing remedies for malfunctions 

g. Prediction - Inferring likely consequences of given situations 

h. Monitoring - Comparing observations to expected outcomes 

i. Instruction - Diagnosing, debugging, and repairing behavior 

j. Interpretation - Inferring situation descriptions from sensor data 

k. Classification - Categorizing objects by properties 

7. How much of the problem space is the Expert System expected to cover? 


a. 100%          f. 60% to 80%

b. > 99%         g. 40% to 60%

c. 95% to 99%    h. Other %

d. 90% to 95%    i. I don't know

e. 80% to 90%

8. What is your estimate of the problem space coverage actually provided by the Expert System?

a. Same as expected    f. 80% to 90%

b. 100%                g. 60% to 80%

c. > 99%               h. 40% to 60%

d. 95% to 99%          i. Other %

e. 90% to 95%          j. I don't know


Questions 9 through 11 are concerned with the percentage of problems within the problem space (covered by
the Expert System) that are answered correctly.

9. If human experts currently perform (or previously performed) the task, how often is the expert(s)
expected to give the correct answer?

a. Task not performed by human    f. 80% to 90%

b. 'Correct' defined by expert    g. 60% to 80%

c. > 99%                          h. 40% to 60%

d. 95% to 99%                     i. Other

e. 90% to 95%                     j. I don't know

10. How often is the Expert System expected to provide the correct answer?

a. 100%          f. 60% to 80%

b. > 99%         g. 40% to 60%

c. 95% to 99%    h. Other

d. 90% to 95%    i. I don't know

e. 80% to 90%

11. What is your estimate of how often the Expert System actually provides the correct answer?

a. 100%          f. 60% to 80%

b. > 99%         g. 40% to 60%

c. 95% to 99%    h. Other %

d. 90% to 95%    i. I don't know

e. 80% to 90%



12. Was the expert(s) a member of the user organization?

a. Yes    c. User organization provided some expertise

b. No



13. Was the developer(s) of the Expert System part of the user organization?

a. Yes    c. Some development provided by user organization

b. No

14. Why do you believe the results that the system gives?

a. Expert says it is correct         e. User acceptance

b. Participated in evaluation        f. I don't trust the results

c. Someone I trust did evaluation    g. Other

d. Personal use and checking



15. How reliable is the Expert System required to be?

a. Trusted with human life            d. Assists the expert

b. Trusted with mission objectives    e. Assists the user

c. As reliable as the expert          f. Other

16. Does the Expert System seem to be more reliable or less reliable than conventional systems that are in
use?

a. Significantly more reliable    f. Less reliable

b. More reliable                  g. Significantly less reliable

c. Slightly more reliable         h. No comparison is available

d. Similar reliability            i. I don't know

e. Slightly less reliable




17. How many people are expected to make use of the Expert System? 

18. How frequently are the (expected) users actually using the system? (Numbers may add up to more 
than 100% if the actual number of users is greater than the expected users.) 

a. % use the system more than expected 

b. % use the system about as much as expected 

c. % use the system less than expected 

d. % do not use the system

If you were not involved with evaluating the Expert System, please leave the remaining questions
unanswered.

19. How much effort was expended by the evaluation team in evaluating the correctness of the Expert 
System? person/months. 


20. What testing activities were performed on the executing system before the system was delivered to the 
users? (indicate any that apply) 


a. No evaluation was performed       d. User acceptance

b. Checked by expert(s)              e. System run in parallel

c. Compared with expected results    f. Other

21. If more than one expert was available for consulting, how often did the experts agree on what results
the Expert System is supposed to provide?

a. No expert was involved          c. Always agree

b. A single expert was involved    d. Agree % of the time.

22. Compared to conventional software testing efforts, how difficult was the evaluation of the Expert
System?

a. Trivial    d. Hard

b. Easy       e. Impossible

c. Medium


23. Many people feel that some development issues are more of a problem with Expert Systems than with 
conventional systems. Which (if any) of the following were problems during testing of the Expert 
System? 

a. Understandability and readability of knowledge structures 

b. Determining test coverage for knowledge structures 

c. Modularity/ Design of knowledge structures 

d. Knowledge validation 

e. Analysis of Certainty Factors 

f. Validating the inference engines 

g. Real-time performance analysis 

h. Complexity of the Problem 

i. Certification 

j. Other